intermediate feature map
Semantic Image Synthesis with Unconditional Generator
Semantic image synthesis (SIS) aims to generate realistic images according to semantic masks given by a user. Although recent methods produce high quality results with fine spatial control, SIS requires expensive pixel-level annotation of the training images. On the other hand, manipulating intermediate feature maps in a pretrained unconditional generator such as StyleGAN supports coarse spatial control without heavy annotation. In this paper, we introduce a new approach, for reflecting user's detailed guiding masks on a pretrained unconditional generator. Our method converts a user's guiding mask to a proxy mask through a semantic mapper.
RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
Yildirim, Muhammed, Ozturk, Ozcan
Abstract--The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs). While lightweight architectures like MobileNetV2 employ Depth-wise Separable Convolutions (DSC) to reduce computational complexity, their multi-stage design introduces a critical performance bottleneck inherent to layer-by-layer execution: the high energy and latency cost of transferring intermediate feature maps to either large on-chip buffers or off-chip DRAM. T o address this memory wall, this paper introduces a novel hardware accelerator architecture that utilizes a fused pixel-wise dataflow. Implemented as a Custom Function Unit (CFU) for a RISC-V processor, our architecture eliminates the need for intermediate buffers entirely, reducing the data movement up to 87% compared to conventional layer-by-layer execution. It computes a single output pixel to completion across all DSC stages-expansion, depthwise convolution, and projection-by streaming data through a tightly-coupled pipeline without writing to memory. Evaluated on a Xilinx Artix-7 FPGA, our design achieves a speedup of up to 59.3x over the baseline software execution on the RISC-V core. Furthermore, ASIC synthesis projects a compact 0.284 mm This work confirms the feasibility of a zero-buffer dataflow within a TinyML resource envelope, offering a novel and effective strategy for overcoming the memory wall in edge AI accelerators. Edge AI[1] involves running artificial intelligence algorithms directly on local hardware, such as sensors and Internet of Things (IoT) units, bringing computation to the source of data creation. This allows for real-time processing without constant reliance on the cloud, an approach that offers several key benefits: low latency due to local processing, enhanced privacy by keeping sensitive data on the device, and reduced network bandwidth consumption, which enables reliable of-fline operation.[2] A critical subfield of this domain is Tiny Machine Learning (TinyML)[3], which specifically focuses on deploying machine learning models directly onto low-cost, ultra-low-power microcontrollers (MCUs) and embedded systems. These devices operate under severe constraints, often with power budgets in the milliwatt range and with only a few hundred kilobytes of memory, making on-device intelligence a significant technical challenge. The typical TinyML workflow involves taking a fully trained model and optimizing it for on-device inference by applying techniques such as quantization and pruning to create a smaller, more efficient model in a compact format.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.04)
P-TAME: Explain Any Image Classifier with Trained Perturbations
Ntrougkas, Mariano V., Mezaris, Vasileios, Patras, Ioannis
The adoption of Deep Neural Networks (DNNs) in critical fields where predictions need to be accompanied by justifications is hindered by their inherent black-box nature. In this paper, we introduce P-TAME (Perturbation-based Trainable Attention Mechanism for Explanations), a model-agnostic method for explaining DNN-based image classifiers. P-TAME employs an auxiliary image classifier to extract features from the input image, bypassing the need to tailor the explanation method to the internal architecture of the backbone classifier being explained. Unlike traditional perturbation-based methods, which have high computational requirements, P-TAME offers an efficient alternative by generating high-resolution explanations in a single forward pass during inference. We apply P-TAME to explain the decisions of VGG-16, ResNet-50, and ViT-B-16, three distinct and widely used image classifiers. Quantitative and qualitative results show that our method matches or outperforms previous explainability methods, including model-specific approaches. Code and trained models will be released upon acceptance.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Rethinking Encoder-Decoder Flow Through Shared Structures
Laboyrie, Frederik, Yucel, Mehmet Kerim, Saa-Garriga, Albert
Dense prediction tasks have enjoyed a growing complexity of encoder architectures, decoders, however, have remained largely the same. They rely on individual blocks decoding intermediate feature maps sequentially. We introduce banks, shared structures that are used by each decoding block to provide additional context in the decoding process. These structures, through applying them via resampling and feature fusion, improve performance on depth estimation for state-of-the-art transformer-based architectures on natural and synthetic images whilst training on large-scale datasets.
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (0.37)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Semantic Image Synthesis with Unconditional Generator
Semantic image synthesis (SIS) aims to generate realistic images according to semantic masks given by a user. Although recent methods produce high quality results with fine spatial control, SIS requires expensive pixel-level annotation of the training images. On the other hand, manipulating intermediate feature maps in a pretrained unconditional generator such as StyleGAN supports coarse spatial control without heavy annotation. In this paper, we introduce a new approach, for reflecting user's detailed guiding masks on a pretrained unconditional generator. Our method converts a user's guiding mask to a proxy mask through a semantic mapper.
Debias the Training of Diffusion Models
Yu, Hu, Shen, Li, Huang, Jie, Zhou, Man, Li, Hongsheng, Zhao, Feng
Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes. Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) have emerged as powerful generative models that garner significant attention recently. Their popularity stems from the remarkable ability to generate diverse and high-quality samples (Dhariwal & Nichol, 2021; Rombach et al., 2022; Ramesh et al., 2022; Nichol & Dhariwal, 2021) as well as the training-stable loss form, compared to the adversarial training paradigms used in Generative Adversarial Networks (GANs) (Goodfellow et al., 2014).
Privacy-Preserving Inference in Machine Learning Services Using Trusted Execution Environments
Narra, Krishna Giri, Lin, Zhifeng, Wang, Yongqin, Balasubramaniam, Keshav, Annavaram, Murali
This work presents Origami, which provides privacy-preserving inference for large deep neural network (DNN) models through a combination of enclave execution, cryptographic blinding, interspersed with accelerator-based computation. Origami partitions the ML model into multiple partitions. The first partition receives the encrypted user input within an SGX enclave. The enclave decrypts the input and then applies cryptographic blinding to the input data and the model parameters. Cryptographic blinding is a technique that adds noise to obfuscate data. Origami sends the obfuscated data for computation to an untrusted GPU/CPU. The blinding and de-blinding factors are kept private by the SGX enclave, thereby preventing any adversary from denoising the data, when the computation is offloaded to a GPU/CPU. The computed output is returned to the enclave, which decodes the computation on noisy data using the unblinding factors privately stored within SGX. This process may be repeated for each DNN layer, as has been done in prior work Slalom. However, the overhead of blinding and unblinding the data is a limiting factor to scalability. Origami relies on the empirical observation that the feature maps after the first several layers can not be used, even by a powerful conditional GAN adversary to reconstruct input. Hence, Origami dynamically switches to executing the rest of the DNN layers directly on an accelerator without needing any further cryptographic blinding intervention to preserve privacy. We empirically demonstrate that using Origami, a conditional GAN adversary, even with an unlimited inference budget, cannot reconstruct the input. We implement and demonstrate the performance gains of Origami using the VGG-16 and VGG-19 models. Compared to running the entire VGG-19 model within SGX, Origami inference improves the performance of private inference from 11x while using Slalom to 15.1x.
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Information Technology > Security & Privacy (1.00)
- Education (0.86)
- Health & Medicine (0.68)
Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Huang, Qian, Katsman, Isay, He, Horace, Gu, Zeqi, Belongie, Serge, Lim, Ser-Nam
Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples in intermediate feature maps.
- Transportation (0.76)
- Information Technology > Security & Privacy (0.48)
Deep multiscale convolutional feature learning for weakly supervised localization of chest pathologies in X-ray images
Sedai, Suman, Mahapatra, Dwarikanath, Ge, Zongyuan, Chakravorty, Rajib, Garnavi, Rahil
Localization of chest pathologies in chest X-ray images is a challenging task because of their varying sizes and appearances. We propose a novel weakly supervised method to localize chest pathologies using class aware deep multiscale feature learning. Our method leverages intermediate feature maps from CNN layers at different stages of a deep network during the training of a classification model using image level annotations of pathologies. During the training phase, a set of \emph{layer relevance weights} are learned for each pathology class and the CNN is optimized to perform pathology classification by convex combination of feature maps from both shallow and deep layers using the learned weights. During the test phase, to localize the predicted pathology, the multiscale attention map is obtained by convex combination of class activation maps from each stage using the \emph{layer relevance weights} learned during the training phase. We have validated our method using 112000 X-ray images and compared with the state-of-the-art localization methods. We experimentally demonstrate that the proposed weakly supervised method can improve the localization performance of small pathologies such as nodule and mass while giving comparable performance for bigger pathologies e.g., Cardiomegaly
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Montana (0.04)